
    Room reflections and constancy in speech-like sounds: within-band effects

    The experiment asks whether constancy in hearing precedes or follows grouping. Listeners heard speech-like sounds comprising eight auditory-filter-shaped noise bands whose temporal envelopes corresponded to those arising in these filters when a speech message is played. The "context" words in the message were "next you'll get _to click on", into which a "sir" or "stir" test word was inserted. These test words came from an 11-step continuum formed by amplitude modulation. Listeners identified the test words appropriately and quite consistently, even though they had the "robotic" quality typical of this type of 8-band speech. The speech-like effects of these sounds appear to be a consequence of auditory grouping. Constancy was assessed by comparing the influence of room reflections on the test word across conditions in which the context had either the same level of reflections or a much lower level. Constancy effects were obtained with these 8-band sounds, but only in "matched" conditions, where the room reflections were in the same bands in both the context and the test word. No constancy effects were found in a comparison "mismatched" condition. It would appear that this type of constancy in hearing precedes the across-channel grouping whose effects are so apparent in these sounds. This result is discussed in terms of the ubiquity of grouping across different levels of representation.
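    The band-envelope construction described above can be sketched as a simple noise vocoder. The sketch below uses FFT band masking and rectify-and-smooth envelope extraction in place of true auditory filters; all function names, band edges, and smoothing parameters are illustrative assumptions, not the study's actual signal chain.

```python
import numpy as np

def vocode(speech, fs, n_bands=8, fmin=100.0, fmax=8000.0, env_win_ms=10.0):
    """Noise-vocode `speech`: keep each band's temporal envelope, but replace
    its fine structure with noise filtered into the same band."""
    n = len(speech)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    edges = np.geomspace(fmin, fmax, n_bands + 1)   # log-spaced band edges
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(n)
    S, N = np.fft.rfft(speech), np.fft.rfft(noise)
    win = int(fs * env_win_ms / 1000.0)             # envelope smoothing window
    kernel = np.ones(win) / win
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(S * mask, n)            # speech in this band
        carrier = np.fft.irfft(N * mask, n)         # noise in the same band
        env = np.convolve(np.abs(band), kernel, mode="same")  # rectify + smooth
        out += env * carrier
    return out
```

With eight bands, the output keeps the per-band envelopes that carry the message while discarding the original fine structure, which is what gives this type of speech its "robotic" quality.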

    The Dynamic Range Paradox: A Central Auditory Model of Intensity Change Detection

    This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funded by an Engineering and Physical Sciences Research Council Doctoral Training Account (www.epsrc.ac.uk) studentship at Queen Mary University of London.

    Acoustic Intensity Causes Perceived Changes in Arousal Levels in Music: An Experimental Investigation

    Listener perceptions of changes in the arousal expressed by classical music have been found to correlate with changes in sound intensity/loudness over time. This study manipulated the intensity profiles of different pieces of music in order to test the causal nature of this relationship. Listeners (N = 38) continuously rated their perceptions of the arousal expressed by each piece. An extract from Dvorak's Slavonic Dance Opus 46 No. 1 was used to create a variant in which the direction of change in intensity was inverted, while other features were retained. Even though it was only intensity that was inverted, perceived arousal was also inverted. The original intensity profile was also superimposed on three new pieces of music. The time variation in the perceived arousal of all pieces was similar to their intensity profiles. Time series analyses revealed that intensity variation was a major influence on arousal perception in all pieces, in spite of their stylistic diversity.
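    One way to invert a piece's intensity profile while retaining its other features is to mirror the frame-by-frame level (in dB) about the mean level and re-apply it as a gain. This is a minimal sketch of that kind of manipulation, not the study's actual procedure; the frame length and function name are assumptions.

```python
import numpy as np

def invert_intensity_profile(x, fs, frame_s=0.05):
    """Mirror a signal's frame-by-frame RMS level (in dB) about its mean
    level, leaving the content within each frame otherwise untouched."""
    hop = int(frame_s * fs)
    n_frames = len(x) // hop
    frames = x[: n_frames * hop].reshape(n_frames, hop).astype(float)
    rms_db = 20 * np.log10(np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-12)
    target_db = 2 * rms_db.mean() - rms_db            # inverted profile
    gain = 10 ** ((target_db - rms_db) / 20)          # per-frame gain
    return (frames * gain[:, None]).ravel()
```

Applied to a crescendo, this produces a diminuendo of the same overall level, which is the kind of variant the study compared against the original.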

    Real-Time Contrast Enhancement to Improve Speech Recognition

    An algorithm that operates in real time to enhance the salient features of speech is described and its efficacy is evaluated. The Contrast Enhancement (CE) algorithm implements dynamic compressive gain and lateral inhibitory sidebands across channels in a modified winner-take-all circuit, which together produce a form of suppression that sharpens the dynamic spectrum. Normal-hearing listeners identified spectrally smeared consonants (VCVs) and vowels (hVds) in quiet and in noise. Consonant and vowel identification, especially in noise, were improved by the processing. The amount of improvement did not depend on the degree of spectral smearing or talker characteristics. For consonants, when results were analyzed according to phonetic feature, the most consistent improvement was for place of articulation. This is encouraging for hearing aid applications because confusions between consonants differing in place are a persistent problem for listeners with sensorineural hearing loss.
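    The combination of compressive gain and lateral inhibitory sidebands can be illustrated with a toy across-channel sharpening step. This is not the CE algorithm itself (which uses a modified winner-take-all circuit); it is a simplified sketch, and the exponent and inhibition strength are assumed values.

```python
import numpy as np

def sharpen_spectrum(channels, exponent=0.3, inhibition=0.5):
    """Toy spectral contrast enhancement: compress each channel's energy,
    then subtract a fraction of the neighbouring channels' (compressed)
    energy, mimicking lateral inhibitory sidebands."""
    compressed = np.asarray(channels, dtype=float) ** exponent  # compressive gain
    sidebands = np.zeros_like(compressed)
    sidebands[1:] += compressed[:-1]      # contribution from lower neighbour
    sidebands[:-1] += compressed[1:]      # contribution from upper neighbour
    sharpened = compressed - 0.5 * inhibition * sidebands
    return np.maximum(sharpened, 0.0)     # half-wave rectify
```

On a broad spectral bump, the inhibition deepens the valleys around the peak, so the peak stands out more relative to its neighbours than compression alone would allow.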

    The Frequency Following Response (FFR) May Reflect Pitch-Bearing Information But is Not a Direct Representation of Pitch

    The frequency following response (FFR), a scalp-recorded measure of phase-locked brainstem activity, is often assumed to reflect the pitch of sounds as perceived by humans. In two experiments, we investigated the characteristics of the FFR evoked by complex tones. FFR waveforms to alternating-polarity stimuli were averaged for each polarity and added, to enhance envelope information, or subtracted, to enhance temporal fine structure information. In experiment 1, frequency-shifted complex tones, with all harmonics shifted by the same amount in Hertz, were presented diotically. Only the autocorrelation functions (ACFs) of the subtraction-FFR waveforms showed a peak at a delay shifted in the direction of the expected pitch shifts. This expected pitch shift was also present in the ACFs of the output of an auditory nerve model. In experiment 2, the components of a harmonic complex with harmonic numbers 2, 3, and 4 were presented either to the same ear (“mono”) or the third harmonic was presented contralaterally to the ear receiving the even harmonics (“dichotic”). In the latter case, a pitch corresponding to the missing fundamental was still perceived. Monaural control conditions presenting only the even harmonics (“2 + 4”) or only the third harmonic (“3”) were also tested. Both the subtraction and the addition waveforms showed that (1) the FFR magnitude spectra for “dichotic” were similar to the sum of the spectra for the two monaural control conditions and lacked peaks at the fundamental frequency and other distortion products visible for “mono” and (2) ACFs for “dichotic” were similar to those for “2 + 4” and dissimilar to those for “mono.” The results indicate that the neural responses reflected in the FFR preserve monaural temporal information that may be important for pitch, but provide no evidence for any additional processing over and above that already present in the auditory periphery, and do not directly represent the pitch of dichotic stimuli.
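    The add/subtract decomposition of alternating-polarity responses can be sketched with a toy model. Here half-wave rectification stands in for phase-locked neural activity (an assumption, not the paper's auditory nerve model): rectification introduces envelope components that do not invert with stimulus polarity, so addition enhances envelope and subtraction recovers fine structure, whose ACF peaks at the period of the missing fundamental.

```python
import numpy as np

fs = 16000
t = np.arange(int(0.2 * fs)) / fs
f0 = 100.0

def stimulus(polarity):
    # harmonics 2, 3, and 4 of a 100-Hz (missing) fundamental
    return polarity * sum(np.sin(2 * np.pi * h * f0 * t) for h in (2, 3, 4))

def toy_response(x):
    # crude stand-in for phase-locked neural activity: half-wave
    # rectification adds envelope components that ignore polarity
    return np.maximum(x, 0.0)

env = (toy_response(stimulus(+1)) + toy_response(stimulus(-1))) / 2  # addition
tfs = (toy_response(stimulus(+1)) - toy_response(stimulus(-1))) / 2  # subtraction

def acf_peak_delay(x, min_lag):
    """Delay (s) of the largest autocorrelation peak beyond `min_lag` samples."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    return (min_lag + int(np.argmax(ac[min_lag:]))) / fs
```

In this toy case the subtraction waveform is exactly the stimulus fine structure, and its ACF peaks at 10 ms, the period of the absent 100-Hz fundamental.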

    Spike-Timing-Based Computation in Sound Localization

    Spike timing is precise in the auditory system and it has been argued that it conveys information about auditory stimuli, in particular about the location of a sound source. However, beyond simple time differences, the way in which neurons might extract this information is unclear and the potential computational advantages are unknown. The computational difficulty of this task for an animal is to locate the source of an unexpected sound from two monaural signals that are highly dependent on the unknown source signal. In neuron models consisting of spectro-temporal filtering and a spiking nonlinearity, we found that the binaural structure induced by spatialized sounds is mapped to synchrony patterns that depend on source location rather than on source signal. Location-specific synchrony patterns would then result in the activation of location-specific assemblies of postsynaptic neurons. We designed a spiking neuron model which exploited this principle to locate a variety of sound sources in a virtual acoustic environment using measured human head-related transfer functions. The model was able to accurately estimate the location of previously unknown sounds in both azimuth and elevation (including front/back discrimination) in a known acoustic environment. We found that multiple representations of different acoustic environments could coexist as sets of overlapping neural assemblies which could be associated with spatial locations by Hebbian learning. The model demonstrates the computational relevance of relative spike timing to extract spatial information about sources independently of the source signal.
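    The core idea, extracting a location cue that is independent of the unknown source signal, can be illustrated in its simplest form: recovering an interaural time difference by comparing the two ear signals against each other. This cross-correlation sketch is a deliberately simplified stand-in for the paper's spiking synchrony model; the signal, delay, and function names are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 44100
source = rng.standard_normal(fs // 10)            # unknown source signal
itd = 20                                          # interaural delay, in samples
left = source
right = np.concatenate([np.zeros(itd), source[:-itd]])  # right ear lags

def estimate_itd(l, r, max_lag):
    """Lag (in samples) by which the right-ear signal trails the left,
    found by maximizing the interaural cross-correlation."""
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [np.dot(l[max_lag:-max_lag], np.roll(r, -k)[max_lag:-max_lag])
              for k in lags]
    return int(lags[int(np.argmax(scores))])
```

Because the comparison is between the two ears rather than against a stored template, the estimate does not require knowing the source waveform, which is the property the synchrony-pattern model generalizes to full head-related transfer functions.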

    Efficient Coding and Statistically Optimal Weighting of Covariance among Acoustic Attributes in Novel Sounds

    To the extent that sensorineural systems are efficient, redundancy should be extracted to optimize transmission of information, but perceptual evidence for this has been limited. Stilp and colleagues recently reported efficient coding of a robust correlation (r = .97) among complex acoustic attributes (attack/decay, spectral shape) in novel sounds. Discrimination of sounds orthogonal to the correlation was initially inferior but later comparable to that of sounds obeying the correlation. These effects were attenuated for less-correlated stimuli (r = .54) for reasons that are unclear. Here, the statistical properties of correlation among acoustic attributes essential for perceptual organization are investigated. Overall, the simple strength of the principal correlation is inadequate to predict listener performance. Initial superiority of discrimination for statistically consistent sound pairs was relatively insensitive to a decreased physical acoustic/psychoacoustic range of evidence supporting the correlation, and to more frequent presentations of the same orthogonal test pairs. However, increased range supporting an orthogonal dimension has substantial effects upon perceptual organization. Connectionist simulations and eigenvalues from closed-form calculations of principal components analysis (PCA) reveal that perceptual organization is near-optimally weighted to shared versus unshared covariance in experienced sound distributions. Implications of reduced perceptual dimensionality for speech perception and plausible neural substrates are discussed.
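    The shared-versus-unshared covariance weighting can be made concrete with a small PCA calculation on two correlated attributes. This is an illustrative sketch, not the study's stimulus set: the noise level of 0.18 is chosen only so that the attribute correlation lands near the r = .97 condition.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
shared = rng.standard_normal(n)                    # covariance shared by both attributes
attack = shared + 0.18 * rng.standard_normal(n)    # e.g. attack/decay
shape = shared + 0.18 * rng.standard_normal(n)     # e.g. spectral shape

X = np.column_stack([attack, shape])
r = np.corrcoef(attack, shape)[0, 1]               # strength of the principal correlation

# eigenvalues of the covariance matrix: shared vs unshared variance
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X.T)))[::-1]
principal_weight = eigvals[0] / eigvals.sum()      # variance on the shared axis
```

Here almost all the variance lies along the shared (correlated) axis, so a system weighted by these eigenvalues would devote most of its representational capacity to that dimension, which is the sense in which the reported perceptual organization is near-optimal.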

    Assessment of thrombin-activatable fibrinolysis inhibitor (TAFI) activation in acquired hemostatic dysfunction: a diagnostic challenge
